Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
Authors
Abstract
We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of the path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve the trainability of ReLU RNNs compared to RNNs trained with SGD, even with various recently suggested initialization schemes.
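Path-SGD rescales each ordinary gradient coordinate by a factor derived from the network's path-regularizer. The sketch below is a feedforward, two-layer illustration of that per-edge scaling (up to constant factors), not the paper's RNN adaptation, which additionally accounts for weights shared across time steps; the function names, learning rate, and epsilon are illustrative.

```python
import numpy as np

def path_scaling(W1, W2):
    """Per-edge path-SGD scaling factors for a two-layer ReLU net (up to constant factors).

    For an edge e, gamma_e sums, over all input->output paths passing through e,
    the product of the squared weights of the other edges on that path. For two
    layers this collapses to simple sums of the squared weight matrices.
    """
    S1, S2 = W1 ** 2, W2 ** 2
    gamma1 = np.zeros_like(W1) + S2.sum(axis=1)           # edge (i, j): sum_k W2[j, k]^2
    gamma2 = np.zeros_like(W2) + S1.sum(axis=0)[:, None]  # edge (j, k): sum_i W1[i, j]^2
    return gamma1, gamma2

def path_sgd_step(W1, W2, g1, g2, lr=0.1, eps=1e-8):
    """One path-SGD update: ordinary gradients g1, g2 rescaled per edge by 1/gamma."""
    gamma1, gamma2 = path_scaling(W1, W2)
    return W1 - lr * g1 / (gamma1 + eps), W2 - lr * g2 / (gamma2 + eps)
```

Dividing by gamma makes the step approximately invariant to rescalings of the weights that leave the ReLU network's function unchanged, which is the geometry the abstract refers to.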
Similar Papers
Spectrally-normalized Margin Bounds for Neural Networks
We present a generalization bound for feedforward neural networks with ReLU activations in terms of the product of the spectral norm of the layers and the Frobenius norm of the weights. The key ingredient is a bound on the changes in the output of a network with respect to perturbation of its weights, thereby bounding the sharpness of the network. We combine this perturbation bound with the PAC...
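For reference, a minimal sketch of the two per-layer quantities the quoted bound is stated in terms of; how the bound actually combines them is more involved than a plain product, so only the norms themselves are computed here, and the function name is illustrative.

```python
import numpy as np

def layer_norms(weights):
    """Per-layer spectral (largest singular value) and Frobenius norms of weight matrices."""
    spectral = [np.linalg.svd(W, compute_uv=False)[0] for W in weights]
    frobenius = [np.linalg.norm(W) for W in weights]  # default matrix norm is Frobenius
    return spectral, frobenius
```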
E-swish: Adjusting Activations to Different Network Depths
Activation functions have a significant impact on neural networks, affecting both how the models train and how they perform on the target problem. Currently, the most widely used activation function is the Rectified Linear Unit (ReLU). This paper introduces a novel activation function, closely related to the recently proposed Swish = x ∗ sigmoid(x) (Ramachandran et al., 2017) [14], which it generalizes. We ca...
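The Swish form quoted above is a one-liner; the sketch below assumes E-swish is the Swish formula multiplied by a scalar beta (with beta = 1 recovering Swish), and the specific beta value is illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish (Ramachandran et al., 2017): x * sigmoid(x)
    return x * sigmoid(x)

def e_swish(x, beta=1.5):
    # E-swish scales Swish by a constant beta; beta = 1 recovers Swish.
    return beta * x * sigmoid(x)
```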
Solving Linear Semi-Infinite Programming Problems Using Recurrent Neural Networks
The linear semi-infinite programming problem is an important class of optimization problems that deals with infinitely many constraints. In this paper, to solve this problem, we combine a discretization method and a neural network method. By a simple discretization of the infinite constraints, we convert the linear semi-infinite programming problem into a linear programming problem. Then, we use...
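A minimal sketch of the discretization step described above, using an illustrative semi-infinite LP and scipy.optimize.linprog as a stand-in for the paper's recurrent-network solver; the constraint family and grid size are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative semi-infinite LP:
#   maximize x1 + x2   s.t.   cos(t)*x1 + sin(t)*x2 <= 1   for all t in [0, pi/2],  x >= 0.
c = np.array([-1.0, -1.0])                 # linprog minimizes, so negate the objective

# Discretize the infinite index set on a finite grid: one linear constraint per grid
# point turns the semi-infinite program into an ordinary LP.
grid = np.linspace(0.0, np.pi / 2, 200)
A_ub = np.column_stack([np.cos(grid), np.sin(grid)])
b_ub = np.ones_like(grid)

res = linprog(c, A_ub=A_ub, b_ub=b_ub)     # default bounds keep x >= 0
print(res.x)                               # approaches (1/sqrt(2), 1/sqrt(2)) as the grid refines
```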
Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations
This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width w_min(d) so that ReLU nets of width w_min(d) (and arbitrary depth) can approximate any continuous function on the unit cube [0, 1]^d arbitrarily well? For ReLU nets near this minimal width, what can one say ...
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
In recent years, stochastic gradient descent (SGD) based techniques have become the standard tools for training neural networks. However, a formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks w...